abstract

C++ dev tools
clang tools

C++ dev tools

why we need tools

C makes it easy to shoot yourself in the foot;
C++ makes it harder, but when you do it blows your whole leg off.

-- Bjarne Stroustrup

C++ is a powerful but complex language, so it is easy to introduce code smells into our project.
Code review (CR) helps keep code quality high, but on its own it is neither efficient nor sufficient.
CR is manual work: spotting simple or hidden bugs by eye costs a lot of time and energy.
So we need tools that find the simple bugs automatically and free our attention for the important and interesting things.

what is good code

easy to read
easy to maintain
easy to use
work as expected
work fast

tools help us write good code

formatter
code generator
code analyzer
code refactor tools
test framework
benchmark (test) framework

formatter

why we need formatter

ensure a consistent code style across the codebase
reduce time spent on code style discussions
keep code reviewers focused on logic
format code automatically to save time

when we format code

on file save (editor integration)
in a pre-commit hook
in CI

formatter tools list

clang-format
astyle
…
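Sometimes hand-aligned code is easier to read than anything a formatter produces. clang-format leaves the region between a // clang-format off comment and a // clang-format on comment untouched. A minimal sketch (the status table and StatusText are made-up examples, not from any real project):

```cpp
#include <array>
#include <string_view>

struct HttpStatus {
    int code;
    std::string_view text;
};

// clang-format off
// Hand-aligned table: clang-format keeps everything up to "clang-format on" as written.
constexpr std::array<HttpStatus, 3> kStatusTable{{
    { 200, "OK"                    },
    { 404, "Not Found"             },
    { 500, "Internal Server Error" },
}};
// clang-format on

// Code below this point is formatted as usual.
constexpr std::string_view StatusText(int code) {
    for (const HttpStatus& status : kStatusTable) {
        if (status.code == code) return status.text;
    }
    return "Unknown";
}
```

Use these markers sparingly; every protected region is a place where the formatter can no longer enforce the shared style.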

code generator

why we need code generator

reduce time spent on boring work
follow code style guidelines
use common design patterns
avoid human error

when we use code generator

generate implementation code snippets from an interface
generate the ctor, dtor, copy ctor, move ctor, copy assignment and move assignment operators
generate getter and setter member functions
generate code from a proto (Protocol Buffers .proto definition)
…
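To make the list above concrete, here is a hand-written sketch of the kind of boilerplate an IDE generates for a class that owns a resource: the special member functions plus getters and a setter. The class Buffer and its members are hypothetical, and real generated code differs between tools.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

// Hypothetical class owning a heap array of ints.
class Buffer {
public:
    explicit Buffer(std::size_t size = 0)                       // ctor
        : size_(size), data_(size ? new int[size]() : nullptr) {}
    ~Buffer() { delete[] data_; }                               // dtor

    Buffer(const Buffer& other) : Buffer(other.size_) {         // copy ctor (deep copy)
        std::copy(other.data_, other.data_ + other.size_, data_);
    }
    Buffer(Buffer&& other) noexcept                             // move ctor (steals storage)
        : size_(std::exchange(other.size_, 0)),
          data_(std::exchange(other.data_, nullptr)) {}

    Buffer& operator=(const Buffer& other) {                    // copy assignment (copy-and-swap)
        if (this != &other) {
            Buffer copy(other);
            Swap(copy);
        }
        return *this;
    }
    Buffer& operator=(Buffer&& other) noexcept {                // move assignment
        if (this != &other) {
            Buffer moved(std::move(other));
            Swap(moved);
        }
        return *this;
    }

    std::size_t Size() const { return size_; }                  // getter
    int Get(std::size_t i) const { return data_[i]; }           // getter
    void Set(std::size_t i, int value) { data_[i] = value; }    // setter

private:
    void Swap(Buffer& other) noexcept {
        std::swap(size_, other.size_);
        std::swap(data_, other.data_);
    }

    std::size_t size_;
    int* data_;
};
```

Writing this by hand is exactly the boring, error-prone work (forgotten self-assignment checks, shallow copies) that a generator removes.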

code generator tools list

IDEs (Visual Studio, CLion, …)
protoc
…

code analyzer

why we need code analyzer (linter)

find undefined behavior && potential bugs automatically
find code smells automatically
enforce code style guidelines (modernize, readability, performance, …)
…

static analyzer

Build warnings
Other tools
- clang-tidy
- coverity
- cppcheck
- …
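As an illustration of what a linter reports beyond build warnings: clang-tidy's modernize-loop-convert and modernize-use-nullptr checks flag code like SumLegacy below, and running it with -checks='modernize-*' -fix rewrites it into roughly the style of SumModern (the exact fix-it text may differ). Both functions are hypothetical and behave identically; only the style differs.

```cpp
#include <cstddef>  // NULL, std::size_t
#include <vector>

// Before: compiles cleanly, yet clang-tidy's modernize-* checks flag two lines.
int SumLegacy(const std::vector<int>& values, const int* bonus) {
    int sum = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {  // modernize-loop-convert
        sum += values[i];
    }
    if (bonus != NULL) sum += *bonus;                   // modernize-use-nullptr
    return sum;
}

// After: the same logic in the style the checks suggest.
int SumModern(const std::vector<int>& values, const int* bonus) {
    int sum = 0;
    for (int value : values) {
        sum += value;
    }
    if (bonus != nullptr) sum += *bonus;
    return sum;
}
```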

dynamic analyzer

valgrind
address sanitizer
…
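Dynamic analyzers catch bugs that only show up at run time. A minimal sketch: the buggy loop below is compiled in only when the hypothetical macro DEMO_HEAP_OVERFLOW is defined. Building with clang++ -g -fsanitize=address -DDEMO_HEAP_OVERFLOW and calling SumFirst with a count larger than the vector's size should make AddressSanitizer stop with a heap-buffer-overflow report, while a normal build may silently read garbage.

```cpp
#include <cstddef>
#include <vector>

// Sums the first `count` elements of `values`.
int SumFirst(const std::vector<int>& values, std::size_t count) {
    int sum = 0;
#ifdef DEMO_HEAP_OVERFLOW
    // Bug on purpose: ignores values.size(), so a large count reads past the
    // end of the vector's heap buffer. ASan reports this at run time.
    for (std::size_t i = 0; i < count; ++i) sum += values.data()[i];
#else
    // Correct version: never read beyond the vector's size.
    for (std::size_t i = 0; i < count && i < values.size(); ++i) sum += values[i];
#endif
    return sum;
}
```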

code refactor tools

what is refactor

Basic set
- rename
- extract function
- …
Profound set
- change function signature
- push/pull data member up/down in class hierarchy
- modernize
- …
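For instance, extract function moves a self-contained piece of logic out of a longer function and gives it a name, without changing behavior. A minimal before/after sketch (all names are hypothetical):

```cpp
#include <string>
#include <vector>

// Before: the validation logic is buried inside the loop.
int CountValidNamesBefore(const std::vector<std::string>& names) {
    int count = 0;
    for (const std::string& name : names) {
        bool valid = !name.empty() && name.size() <= 16;
        for (char c : name) {
            if (!(c >= 'a' && c <= 'z')) valid = false;
        }
        if (valid) ++count;
    }
    return count;
}

// After "extract function": the check has a name and can be tested on its own.
bool IsValidName(const std::string& name) {
    if (name.empty() || name.size() > 16) return false;
    for (char c : name) {
        if (c < 'a' || c > 'z') return false;
    }
    return true;
}

int CountValidNamesAfter(const std::vector<std::string>& names) {
    int count = 0;
    for (const std::string& name : names) {
        if (IsValidName(name)) ++count;
    }
    return count;
}
```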

why we need refactor tools

You may ask: we can refactor by hand or with regular expressions, so why do we need tools?

Just think about renaming:

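#include <cstdio>
#include <sys/stat.h>
// example for confusing names
void PrintFileSize() {
    struct stat stat;       // 'stat' is a struct name, but also a variable name
    ::stat("file", &stat);  // '::stat' is a function name
    std::printf("%lld\n", static_cast<long long>(stat.st_size));  // and 'stat' is the variable again
}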

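If we want to rename only the struct ‘stat’ to ‘Mystat’, how can we do it? A text search or regex cannot tell the struct, the variable and the function apart.
Refactoring tools such as clang-rename can, because the clang libraries (clangParse, clangSema, clangAST and others) have already done the hard work of resolving what each name refers to.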
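To see why a plain text substitution is not enough, here is a small sketch (NaiveRename is a hypothetical helper) that applies the whole-word regex \bstat\b to a line modeled on the snippet above. It renames the variable and the function call as well as the struct, which is exactly what a semantic tool avoids.

```cpp
#include <regex>
#include <string>

// Naive "rename" by regex: replaces every whole-word occurrence of `from`.
// It cannot know whether a token is a type, a variable, or a function.
// Assumes `from` is a plain identifier (no regex metacharacters).
std::string NaiveRename(const std::string& code, const std::string& from,
                        const std::string& to) {
    const std::regex word("\\b" + from + "\\b");
    return std::regex_replace(code, word, to);
}
```

For the input "struct stat stat;\nstat(\"file\", &stat);\n" we only wanted "struct Mystat stat;", but the result is "struct Mystat Mystat;\nMystat(\"file\", &Mystat);\n".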

code refactor tools list

IDEs (Visual Studio, CLion, …)
clang-refactor
ClangMR (MapReduce)
…

test framework

Everyone knows testing is important, so I will not talk about why we need tests.

test framework list

Google Test
Boost.Test
…

benchmark (test) framework

what is benchmark

Example 1

For example, we want to compare the performance of two implementations.
The first uses std::unordered_map to store the data, and the second uses std::map.
We expect std::unordered_map (average O(1) lookup) to be faster than std::map (O(log n) lookup), but we don't know how much faster find() is when each container stores 10000 ints and the code is compiled with -O2.
So we write a benchmark to compare them.


#include <benchmark/benchmark.h>

#include <algorithm>
#include <map>
#include <numeric>
#include <random>
#include <unordered_map>
#include <vector>

constexpr int kNumElements = 10000;

// Keys 0..9999 in shuffled order, built once outside the timed loops so that
// random number generation is not measured together with find().
const std::vector<int>& Keys() {
    static const std::vector<int> keys = [] {
        std::vector<int> result(kNumElements);
        std::iota(result.begin(), result.end(), 0);
        std::shuffle(result.begin(), result.end(), std::mt19937(42));  // fixed seed
        return result;
    }();
    return keys;
}

// Benchmark for std::unordered_map
static void BM_UnorderedMap_Read(benchmark::State& state) {
    const std::vector<int>& keys = Keys();
    std::unordered_map<int, int> unordered_map;
    for (int key : keys) unordered_map[key] = key;

    for (auto _ : state) {
        for (int key : keys) {  // every lookup hits an existing key
            benchmark::DoNotOptimize(unordered_map.find(key));
        }
    }
}
BENCHMARK(BM_UnorderedMap_Read);

// Benchmark for std::map
static void BM_Map_Read(benchmark::State& state) {
    const std::vector<int>& keys = Keys();
    std::map<int, int> map;
    for (int key : keys) map[key] = key;

    for (auto _ : state) {
        for (int key : keys) {  // every lookup hits an existing key
            benchmark::DoNotOptimize(map.find(key));
        }
    }
}
BENCHMARK(BM_Map_Read);

BENCHMARK_MAIN();

Example 2

We want to use std::string_view instead of const std::string& for string parameters.
But how much performance do we actually gain?
Suppose callers pass both const char* and std::string arguments to such a function.
We write a benchmark that covers all four combinations.


#include <benchmark/benchmark.h>

#include <cstddef>
#include <random>
#include <string>
#include <string_view>

// Function that generates a random alphanumeric string of the given length
std::string GenerateRandomString(std::size_t length) {
    const std::string chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    std::random_device random_device;
    std::mt19937 generator(random_device());
    std::uniform_int_distribution<std::size_t> distribution(0, chars.size() - 1);

    std::string random_string;
    for (std::size_t i = 0; i < length; ++i) {
        random_string += chars[distribution(generator)];
    }

    return random_string;
}

// Function that takes std::string_view.
// Both data() and size() are kept alive, so the compiler cannot skip the
// strlen() needed to build a string_view from a const char*.
void FunctionWithStringView(std::string_view str) {
    benchmark::DoNotOptimize(str.data());
    benchmark::DoNotOptimize(str.size());
}

// Function that takes const std::string&
void FunctionWithStringRef(const std::string& str) {
    benchmark::DoNotOptimize(str.data());
    benchmark::DoNotOptimize(str.size());
}

// Benchmark for std::string_view with const char*
static void BM_StringViewWithChar(benchmark::State& state) {
    std::string str = GenerateRandomString(100);
    const char* cstr = str.c_str();

    for (auto _ : state) {
        FunctionWithStringView(cstr);
    }
}
BENCHMARK(BM_StringViewWithChar);

// Benchmark for const std::string& with const char*
static void BM_StringRefWithChar(benchmark::State& state) {
    std::string str = GenerateRandomString(100);
    const char* cstr = str.c_str();

    for (auto _ : state) {
        FunctionWithStringRef(cstr);
    }
}
BENCHMARK(BM_StringRefWithChar);

// Benchmark for std::string_view with std::string
static void BM_StringViewWithString(benchmark::State& state) {
    std::string str = GenerateRandomString(100);

    for (auto _ : state) {
        FunctionWithStringView(str);
    }
}
BENCHMARK(BM_StringViewWithString);

// Benchmark for const std::string& with std::string
static void BM_StringRefWithString(benchmark::State& state) {
    std::string str = GenerateRandomString(100);

    for (auto _ : state) {
        FunctionWithStringRef(str);
    }
}
BENCHMARK(BM_StringRefWithString);

BENCHMARK_MAIN();
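The main reason for the difference in Example 2: binding a const char* to a const std::string& parameter constructs a temporary std::string, and a 100-character string does not fit in the small-string buffer of common standard libraries, so each call costs a heap allocation. A std::string_view only stores a pointer and a length. A minimal sketch that makes this visible by counting calls to a replaced global operator new (an illustration technique, not part of the benchmark; all names here are hypothetical):

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>
#include <string_view>

// Counts every call to the global allocation function (illustration only).
static std::size_t g_allocations = 0;

void* operator new(std::size_t size) {
    ++g_allocations;
    if (void* p = std::malloc(size == 0 ? 1 : size)) return p;
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Stores the argument's address so the calls cannot be optimized away.
static const char* volatile g_sink = nullptr;

void TakesStringView(std::string_view s) { g_sink = s.data(); }
void TakesStringRef(const std::string& s) { g_sink = s.data(); }

// Returns how many heap allocations happen while calling f(cstr).
template <typename F>
std::size_t AllocationsDuringCall(F f, const char* cstr) {
    const std::size_t before = g_allocations;
    f(cstr);  // for TakesStringRef this builds a temporary std::string
    return g_allocations - before;
}
```

With a 100-character C string we expect zero allocations for the string_view parameter and one for the const std::string& parameter, on standard libraries whose small-string buffer is shorter than 100 characters (libstdc++ and libc++ both qualify).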

why we need benchmark

performance comparison: compare the performance of algorithms, data structures, or implementations
performance optimization: identify bottlenecks
performance regression testing: ensure performance does not get worse than before

benchmark framework tools

Google Benchmark
https://quick-bench.com/
…

clang tools

clang family

clang: C, C++, Objective-C and Objective-C++ compiler
clang-format: code formatter
clang-tidy: code analyzer && code refactor tool (can apply fixes automatically)
clang-refactor: code refactor tool
clangd: language server; supports code completion, go to definition, find references, rename, …
…

precondition when we use clang tools

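Your source code must compile successfully with clang, at least to object files.
Then we can use the clang tools to analyze and refactor it.
Please note: the code only needs to compile with clang :-); there is no need to use or deploy the clang-built outputs.
The tools also need the exact compile flags, usually provided by a compilation database (compile_commands.json, which CMake writes when configured with -DCMAKE_EXPORT_COMPILE_COMMANDS=ON).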

what can we do with clang tools

write our own checkers

For example, suppose our codebase is full of code like the following. It passes all our clang-tidy checks and compiles without any warnings, yet it can be slow:

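// slowloop.cc
#include <string.h>

const char* GetSomeData();  // hypothetical data source, defined elsewhere

void SlowLoop() {
  const char* str = GetSomeData();
  // If the compiler doesn't hoist strlen(str) out of the loop, this loop is O(n^2),
  // yet no warning is generated.
  for (size_t index = 0; index < strlen(str); ++index) {
    // do something
  }
}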

We can dump clang's AST to see exactly how clang understands this code:

clang++ -Xclang -ast-dump -fsyntax-only slowloop.cc

The output contains an AST like this (excerpt):

|   `-ForStmt 0x9759b40 <line:6:3, line:8:3>                                                                           // for(
|     |-DeclStmt 0x97599a8 <line:6:8, col:24>
|     | `-VarDecl 0x9759930 <col:8, col:23> col:15 used index 'size_t':'unsigned int' cinit                            // size_t index = 0;
|     |   `-ImplicitCastExpr 0x9759998 <col:23> 'size_t':'unsigned int' <IntegralCast>
|     |     `-IntegerLiteral 0x9759970 <col:23> 'int' 0
|     |-<<<NULL>>>                                                                                                     // (no condition variable)
|     |-BinaryOperator 0x9759ae8 <col:26, col:44> 'bool' '<'                                                           // index < strlen(str)
|     | |-ImplicitCastExpr 0x9759ad8 <col:26> 'size_t':'unsigned int' <LValueToRValue>
|     | | `-DeclRefExpr 0x97599c0 <col:26> 'size_t':'unsigned int' lvalue Var 0x9759930 'index' 'size_t':'unsigned int'
|     | `-CallExpr 0x9759aa8 <col:34, col:44> 'size_t':'unsigned int'
|     |   |-ImplicitCastExpr 0x9759a98 <col:34> 'size_t (*)(const char *) __attribute__((cdecl))' <FunctionToPointerDecay>
|     |   | `-DeclRefExpr 0x9759a38 <col:34> 'size_t (const char *) __attribute__((cdecl))':'size_t (const char *)' lvalue Function 0x92edfb8 'strlen' 'size_t (const char *) __attribute__((cdecl))':'size_t (const char *)'
|     |   `-ImplicitCastExpr 0x9759ac8 <col:41> 'const char *' <LValueToRValue>
|     |     `-DeclRefExpr 0x9759a18 <col:41> 'const char *' lvalue Var 0x9759890 'str' 'const char *'
|     |-UnaryOperator 0x9759b20 <col:47, col:49> 'size_t':'unsigned int' lvalue prefix '++'                            // ++index
|     | `-DeclRefExpr 0x9759b00 <col:49> 'size_t':'unsigned int' lvalue Var 0x9759930 'index' 'size_t':'unsigned int'
|     `-CompoundStmt 0x9759b30 <col:56, line:8:3>

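Finally, we can write our own checker (an AST matcher pattern) to find this kind of code smell. It can be tried out interactively in clang-query before being turned into a clang-tidy check: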

forStmt(hasCondition(hasDescendant(callExpr(callee(functionDecl(hasName("strlen")))))))
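Once the matcher has found such loops, the usual fix is to compute the length once, before the loop. A minimal sketch of the corrected pattern (CountDigits is a hypothetical stand-in for "do something"):

```cpp
#include <cstddef>
#include <cstring>

// Counts decimal digits in a C string. strlen() is called once, so the loop
// is O(n) whether or not the compiler could have hoisted the call itself.
std::size_t CountDigits(const char* str) {
    std::size_t digits = 0;
    const std::size_t length = std::strlen(str);
    for (std::size_t index = 0; index < length; ++index) {
        if (str[index] >= '0' && str[index] <= '9') ++digits;
    }
    return digits;
}
```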

For more info, you can see